perm filename CHAOS.NET[DLN,MRC] blob
sn#311218 filedate 1977-10-20 generic text, type C, neo UTF8
CHAOS ORDER
**** DRAFT ****
NOTES:
Work more on dynamic flow control; see end of that section. Datagrams
seem to have a bug: the two parties can never agree on whether they
both agree that it really happened. Am I losing? Flush cruft at end?
Delete NOP since WIN is just as good and keeps window size more consistent
in the face of lost packets.
-------
Goals
Non-goals
Hardware Assumptions and Constraints
User appearance
Connections
Contact Names
ITS implementation
Lisp Machine implementation
Network Control Program
Packets
Hosts attached to more than one subnet
Subnet and Host numbers
Packet numbers
Indices
Routing
Operations
Table of packet field use vs opcode
Flow and Error Control
Dynamic Window-size Adjustment
Uniquization
Media Handlers
Buffers
ITS System Calls
Comparison with LCSNET
Principal differences
High priority data packets, interrupts, and flushing
Data grams
Multiple messages per packet
Checksums in the packet
Host-independent user-level protocols
Very Small Hosts
Transmission Media
Ethernet
TEN11 Interface
DL10 & DTE20
Asynchronous line
Higher-Level Protocols
Telnet
File Access
Mail
Locate-named-service
>Goals
High speed communication between processes running in various local
machines. By "high speed", I mean much faster than the Arpanet. At least
comparable to TU20 magtape (30000 characters/second), about 10 times the
speed of the Arpanet.
No undetected errors in data transmission.
Not to depend on a particular medium. (However, we are compromising by
picking a fixed packet size. The simplicity and efficiency are worth it.)
Simple enough to put in small pdp11's. Also, simple to the user.
As much power as the Arpanet but, hopefully, a lot less hair.
Work well for both "telnet" and "file transfer."
The initial implementation in ITS should have the "in-system" part as
small and simple as possible. This includes no per-host tables in the
NCP, which allows large nets and easy reconfiguration. There is, of
course, a host-name to address translation table used by "user" programs.
The NCP has to have a per-subnet table which remembers where the bridge(s)
to that subnet from the local subnet are.
>Non-goals
Byte sizes other than 8 bits. (pdp10 binary transmission should be part
of a user-level file-transfer/ML-device protocol.)
Compatibility with the Arpanet.
Substituting for TEN11 interface functions such as running the AI TV11 and
XGP.
Automatic routing will be deferred. Initially the routing tables will be
assembled into the programs. A host needs one routing table for each
subnet it is connected to (or one for each network interface it
possesses.)
>Hardware Assumptions and Constraints
Transmission is physically in "packets" which have headers, rather than
in, e.g., continuous streams.
The prototype caiosnet (ether) interface limits the physical length of a
packet to 1025 bits including overhead bits. The net result is that the
maximum number of data bytes in any packet is 104. This limitation will
be extended to the whole network (to keep things simple). However some
provision will be made for a possible network-wide packet-size expansion.
All transmission media will be assumed to be highly-reliable but not
perfect; "perfect" reliability will be assured by having the two ends of a
connection use an acknowledgement protocol which detects lost messages.
Transmission media are required to lose any messages that they don't
deliver intact. (I.e. there must be hardware checksums.)
The acknowledgement protocol must be designed not to limit performance.
Statistical flow control (see below).
>User appearance
The network allows user processes in various machines to communicate with
each other in various ways, for instance, in imitation of a terminal, or
in imitation of a disk file system. These facilities are built on top of
the basic capability to send "packets" (a header plus some data in the
form of 8-bit bytes) through the network. The network undertakes never to
lose or garble any packets, except when the connection is cut off
entirely.
This document defines the low-level, "in-system" part of the protocol. On
top of this, special programs (running in user-mode) will implement the
higher-level protocol that the general user program sees. These protocols
and programs won't be discussed further in this document, but remember
that the strange packet formats and so forth are not seen by most user
programs.
>>Connections
When two processes wish to communicate, they establish a connection
between them. This connection allows two streams of packets to flow, one
in each direction. [Explain why connections should be bi-directional
rather than uni-directional. Basically that's what you always want, and
it makes things simpler.]
Connections are essentially the only facility provided by the network.
However, when first establishing the connection it is necessary for the
two processes to contact each other, and make each other known to their
respective operating systems. In addition, it is often the case (in the
usual user-server situation) that one of the processes does not exist
beforehand, but is to be created and made to run a specified program.
>>Contact Names
The way we choose to implement contacting is to say that one process is
always a "user" and one process is always a "server". The server has some
"contact name" to which it "listens". The user requests its operating
system to connect it to a specified contact name at a specified host. If
a process at that host is listening to that contact name, the two are
connected. If no one is listening to that contact name, the operating
system must create a server process which will load itself with the
appropriate program and connect up.
Discovering which host to connect to in order to obtain a given service is
an issue for higher-level protocols. It will not be dealt with at all initially
(that is, there will be a table of host names and numbers and the user
will have to enter the name.)
Once the connection has been established, there is no more need for the
contact name, and it is discarded. Indeed, often the contact name is
simply the name of a network protocol (such as "telnet") and several users
may want to have connections to that service at the same time, so contact
names must be "reusable." (In the other common case, the contact name
will be a "gensym".)
As far as the operating systems involved are concerned, contact names are
simply arbitrary ascii strings defined by user programs. It is expected
that the various higher-level protocols will define standard contact
names; for instance, to get the telnet protocol one would connect to
"telnet"; to get the file transfer protocol one would connect to
"file-transfer". If a machine receives a request to connect to a contact
name which no one is currently listening to, a server process must be
created and made to execute a program which decides, from the contact
name, what server program to load and execute, or else to refuse the
request for connection.
Contact names have no relation to file names; they are simply a device for
introducing two processes to each other. If one was using the network to
transfer a file, one would first contact the file transfer server at the
appropriate host, then send a packet containing the name of the file to be
accessed.
>>ITS system calls
Ordinary user programs will not access the network directly; they will go
indirectly through a job-device or sty-type program which will use a
higher-level protocol to make the network look like what the user wants,
the traditional things being a terminal and a disk file system.
Since these intermediate user-mode programs for using the network will
exist, there is no reason for the interface to the low level network
provided by the system to look at all like a standard device. Instead, it
will be designed solely for simplicity and ease of implementation, and for
a certain degree of efficiency. This interface will be described after
the interface between Network Control Programs in different machines (the
low-level protocol) is described.
At some future time the intermediate programs might get moved into the
system for reasons of efficiency, but that should not be allowed to
complicate the initial implementation.
There will be a .INSRT-able file of routines similar to NETWRK.
>>Lisp Machine implementation
In the case of the Lisp Machine, the only distinction between user
programs and system programs is who maintains and documents them, and how
carefully. The code will be modularized in about the same way as in the
ITS implementation.
>Network Control Program
This is the part of the operating system(s) that implements the network
(obviously).
>>Packets
The NCP's operate by exchanging packets. A packet consists of a header
containing control information, and zero or more 8-bit bytes of data.
Hardware restrictions of the prototype CAIOS net interface restrict the
maximum length of a packet to 61 16-bit words. In fact, we will limit it
to 60 words (to make packet buffers in pdp10's be 32 words including two
overhead words). Again for the convenience of pdp10's, the header should
be an even number of 16-bit words.
In this section the packets will be described as they look to a pdp11.
They look the same inside a Lisp Machine, since the byte structure is the
same. Inside a pdp10, packets are stored with two 16-bit words
left-adjusted in each pdp10 word. Additionally, the bytes in the data
portion of the packet are swapped so as to put them in pdp10 standard
order. pdp11's that act as network interfaces for pdp10's will be
required to do this byte swapping since they're likely to have more time
available than the 10 to do it in, and can also do it faster, having a
special instruction for it. pdp10's that communicate directly to the
network will have hardware assistance for byte reshuffling in their
interfaces. See the transmission media section for how packets are
encapsulated during transmission through the various media.
The header is 8 16-bit words and contains the following fields:
-----------------------
|opcode(8) | unused(8)|
-----------------------
|fc(4) | nbytes(12) |
-----------------------
| destination host # |
-----------------------
| destination index |
-----------------------
| source host # |
-----------------------
| source index |
-----------------------
| packet # |
-----------------------
| ack packet # |
-----------------------
opcode - tells the receiver of the packet how to interpret it. See the
Operations section below. This is 8 bits long. The 128 opcodes with high
order bit =0 are for NCP use. The 128 with high order bit =1 are for
user-program use.
unused 8 bits reserved for future use.
fc - forwarding count. 4 bits which count the number of times this packet
has been forwarded by bridges. Initially this field is always generated
as zero. Each bridge increments it; if it overflows, there is assumed to
be a loop and the packet is discarded.
nbytes - the number of 8-bit bytes of data in the data part. The maximum
value of nbytes is 104. The minimum is 0. This is 12 bits long to allow
for 4K-bit packets. (Actually 12 bits is enough for up to 32K-bit
packets.)
destination host # This is divided into two 8-bit fields. The high byte
specifies which subnet. The low byte specifies which host on that subnet,
and (on ethernet subnets) is identical to the hardware host number.
Neither field may be zero in a valid host number.
destination index - index for this connection assigned by the destination
host's NCP.
source host # - see destination host #.
source index - index for this connection assigned by the source host's
NCP.
packet # - an ascending reference number used in error and flow control
(see below).
ack packet # - used in error and flow control (see below.)
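For concreteness, the header layout above can be sketched in modern terms.
The following Python fragment is purely illustrative; the byte order within
each 16-bit word is an assumption (this document defines 16-bit words, not
a byte order on the wire).

```python
import struct

# Illustrative pack/unpack of the 8-word (16-byte) packet header described
# above. Field names follow the text; little-endian words are an assumption.

def pack_header(opcode, fc, nbytes, dst_host, dst_idx,
                src_host, src_idx, pktno, ackno):
    word0 = (opcode & 0xFF) << 8                  # opcode high byte; unused low byte = 0
    word1 = ((fc & 0xF) << 12) | (nbytes & 0xFFF) # fc(4) | nbytes(12)
    return struct.pack('<8H', word0, word1, dst_host, dst_idx,
                       src_host, src_idx, pktno, ackno)

def unpack_header(raw):
    w = struct.unpack('<8H', raw[:16])
    return {'opcode': w[0] >> 8, 'fc': w[1] >> 12, 'nbytes': w[1] & 0xFFF,
            'dst_host': w[2], 'dst_idx': w[3], 'src_host': w[4],
            'src_idx': w[5], 'pktno': w[6], 'ackno': w[7]}
```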
>>Hosts attached to more than one subnet
(This also applies to hosts with more than one interface to the same
subnet, if there ever are any.)
Such hosts ought to act as bridges. That is, if a packet is received
which is not addressed to this host, it should be sent back out, using
this host's routing tables. The forwarding count should be used to
prevent infinite loops in the event of inconsistent forwarding tables in
two bridges.
It is undesirable for a host to have more than one number. So a host
connected to multiple subnets should choose one subnet as its "home",
which is the address which is advertised as that host. The host's other
network connections are un-named bridges. In some cases it may be
preferable not to pick a "home" subnet; instead, one invents a new private
subnet which only has that one host on it, and all the host's network
connections act as bridges to that subnet (also to each other).
The routing should be set up so that packets destined for such a host from
a subnet on which it has an interface choose that interface as the bridge
to that host, so that in fact data flows the same way as if the host had
more than one number and the host-name to host-number lookup magically
chose the right number.
>>Subnet and host numbers.
Each machine has an identifying number. Since these are totally
arbitrary, I will assign some right now to avoid any confusion.
Subnet numbers
0 not used.
1 9th floor subnet
2 subnet that bridges between bldg 38 and 9th floor
Host numbers. These are probably wrong, since the
host number will be the same as the physical ethernet
address, which depends on the relative position
on the cable.
0 not used.
1 AI
2 ML
3 DM
4 MC
5 Micro automation
6 LOGO
7 Plasma physics
10 Chess machine
11 Lisp machine #1
12 Lisp machine #2
>>Packet numbers
Each time the sending user puts another packet into the network, this
number is increased by one. (These numbers are independent for the two
directions of a connection.) The receiver uses these numbers to get the
packets into order and ensure that there are no duplications nor
omissions. The packet numbers are 16 bits and wrap around to zero when
they overflow. When the connection is first opened, an initial value for
the packet# is established. If it was 0, then the packet# of the first
data packet would be 1.
The receiver returns packet numbers to the sender in the "ack packet #"
field of packets going in the opposite direction on the same connection.
The number in this field is the number of the latest packet successfully
received by the receiver. The sender need not make any further attempt to
send packets numbered less or equal to this. More on this below.
Packet #'s should be compared modulo 2**16. On pdp11's, use
	CMP A,B
	BMI <A is less>		(BMI rather than BLT or BLO)
On pdp10's, use
	SUB A,B
	TRNE A,100000		(rather than CAMGE A,B)
	 JRST <A is less>
On Lisp machines, use
	(AND (BITTEST 100000 (- A B))
	     <A is less>)	[rather than (AND (< A B) ...)]
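The same comparison, as an illustrative sketch in Python, equivalent to the
machine-dependent tests above:

```python
def pkt_less(a, b):
    # True iff packet number a precedes b modulo 2**16: a is "less" exactly
    # when bit 15 of (a - b) mod 2**16 is set, mirroring the pdp11 BMI and
    # pdp10 TRNE 100000 tests above.
    return ((a - b) & 0xFFFF) >= 0x8000
```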
>>Indices
Each connection has two indices assigned to it, one at each end. Each
index is an arbitrary 16-bit number assigned by the NCP at its end;
usually it is an index into that NCP's tables. Indices are required to be
non-zero. For maximum simplicity, all packets include both indices. The
receiver of a packet uses the destination index to find out who to give
the packet to. Generally the source index is used only for error
checking, but when a connection is first opened the source index has to be
saved and used as the destination index in future packets in the reverse
direction.
To prevent packets somehow left over from old connections from interfering
with new connections, we require that a certain time elapse between
successive connections between the same pair of hosts with the same index
numbers at each end, this time to be longer than the maximum time a packet
is reasonably expected to sit around in the network somewhere (the maximum
transit time through all bridges, etc.) This requirement is implemented
by making part of the index number be a "uniquizer"; when a host reuses a
slot in its tables, it increments the uniquizer, so that the index number
for that slot is not the same as it was previously. Then if the uniquizer
field is sufficiently wide, and the rate of creation of connections
(actually the rate of allocation of indices) is sufficiently low, the
requirement will be satisfied. For the kind of network we're talking
about, the time is several seconds, and the uniquizer need only be a few
bits wide. It is up to each host how wide it makes the uniquizer,
depending on how big it wants its tables to be. It is best if table slots
are also allocated circularly, so the same one is not reused immediately.
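As an illustration of the uniquizer scheme, here is a sketch in Python. The
3-bit/13-bit split and all names are invented examples; as stated above,
each host picks its own widths.

```python
UNIQ_BITS = 3     # invented example width for the uniquizer
SLOT_BITS = 13    # remaining bits of the 16-bit index select a table slot

class IndexTable:
    def __init__(self):
        self.uniquizers = [0] * (1 << SLOT_BITS)
        self.next_slot = 1    # slot 0 never used: indices must be non-zero

    def allocate(self):
        # Allocate slots circularly (skipping 0) so the same slot is not
        # reused immediately; bump the slot's uniquizer on each reuse so
        # the resulting index differs from the previous one for that slot.
        slot = self.next_slot
        self.next_slot = slot % ((1 << SLOT_BITS) - 1) + 1
        self.uniquizers[slot] = (self.uniquizers[slot] + 1) % (1 << UNIQ_BITS)
        return (self.uniquizers[slot] << SLOT_BITS) | slot
```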
A user process's "capability" or "channel" to a connection, used by it to
ask the NCP to operate on that connection, simply contains the appropriate
index.
Associated with each index the NCP has a "state", the host # and index #
of the other end of the connection, some read buffers and associated
variables, including a current packet #, and some write buffers and
associated variables, again including a current packet #.
The "state" can be Closed (no connection or other activity currently
associated with this index), Open (this index has a connection to another
index at another machine), RFC-sent (requested another machine for a
connection, but no answer yet), Listen (listening for a request for
connection to a certain contact name), Broken (connection closed
abnormally by network or machine lossage), and RFC-received (waiting for a
server process to get going and pick up a request for connection that came
in).
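A sketch of the per-index record just described; names and layout are
illustrative only, not a prescribed data structure.

```python
from enum import Enum, auto

class State(Enum):
    CLOSED = auto()        # no connection or other activity on this index
    OPEN = auto()          # connected to an index at another machine
    RFC_SENT = auto()      # requested a connection, no answer yet
    LISTEN = auto()        # listening on a contact name
    BROKEN = auto()        # closed abnormally by network or machine lossage
    RFC_RECEIVED = auto()  # waiting for a server process to pick up the RFC

class Index:
    def __init__(self):
        self.state = State.CLOSED
        self.foreign_host = 0    # host # of the other end of the connection
        self.foreign_index = 0   # index # of the other end
        self.read_pktno = 0      # current packet # for the read direction
        self.write_pktno = 0     # current packet # for the write direction
        self.read_buffers = []
        self.write_buffers = []
```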
>>Routing
This section is a place-holder. Initially routing will be kludged with a
fixed table. Once the network is set up automatic routing will be put in.
The routing decision consists of looking at the destination subnet field
of a packet and deciding which interface (on a multi-interface host) to
transmit it on, i.e. what is the best route to that subnet. In addition,
if the destination is not on a subnet to which there is a direct
interface, one must determine what is the host number of the bridge it
should be sent to.
It also involves recognizing packets which are destined to the current
host and receiving them.
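A minimal sketch of such a fixed-table routing decision; the table contents,
interface names, and host numbers here are invented examples.

```python
MY_HOST = 0x0101                 # example: subnet 1, host 1
ROUTE = {                        # subnet -> (interface, bridge host # or None)
    1: ('ether0', None),         # directly attached subnet: send to host itself
    2: ('ether0', 0x0104),       # reach subnet 2 via the bridge at 0x0104
}

def route(dst_host):
    # Packets addressed to this host are received locally; otherwise the
    # destination subnet (high byte) selects an interface and, if the subnet
    # is not directly attached, the bridge to forward through.
    if dst_host == MY_HOST:
        return ('local', None)
    interface, bridge = ROUTE[dst_host >> 8]
    return (interface, bridge if bridge is not None else dst_host)
```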
>>Operations
This section tells what the values of the opcode field in a packet are,
and how an NCP responds to each one.
1 RFC - Request for Connection
This message is sent from user to server in order to open a connection.
The data contains the contact name. Actually, if the data contains a
space, the characters before the space are the contact name and the
characters after the space are "arguments". This is done so that simple
transactions can work (see below).
The destination index is zero, because it is not known yet. The responses
are:
OPN, if a server process is found or created that wishes to accept the
request and open up a connection.
CLS, if the connection cannot be opened. The data field contains an ascii
explanation of why not.
ANS, if a simple transaction is occurring. The data contains the answer,
and no connection is ever created.
FWD, if there is no server process at this host, but there might be a
suitable one some place else.
There may also be no response, if the RFC was lost in the network, or the
destination host is down, or the reply was lost in the network. The
network software guarantees reliable transmission once a connection has
been established, but not reliable transmission of the control messages
that initially establish a connection, so appropriate time-outs must exist
(perhaps only in the user process that asks the NCP to issue an RFC).
The packet # field contains the first packet # that will be assigned to
data transmitted from the user process, minus one modulo 2**16. In the
simplest case, this can be zero, and the first packet sent will be packet
# 1. One might also imagine uniquizing the packet numbers as an extra
error check, but this should not be necessary, because the indices are
uniquized, and connections must be explicitly agreed to by each end before
data can flow.
The ack packet # field contains the initial window size for packets sent
to the sender of the RFC (see below). This field is used because there
can be no acknowledgement going on, since there is no connection yet.
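The contact-name/argument convention above amounts to splitting the RFC data
field at the first space; a trivial sketch (illustrative only):

```python
def parse_rfc_data(data):
    # Characters before the first space are the contact name; characters
    # after it are the arguments (empty if there is no space).
    name, _, args = data.partition(' ')
    return name, args
```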
2 OPN - Connection Open
This is the positive acknowledgement to RFC. The source index field
conveys the acknowledger's connection index to the requester. The packet
# field contains the first packet # that will be assigned to data
transmitted from the server process, minus one modulo 2**16.
The ack packet # field contains the initial window size for packets sent
to the sender of the OPN (see below). This field is used because there
can be no acknowledgement going on, since there is no connection yet.
3 CLS - Connection Closed
CLS is the negative response to RFC. It indicates that no server was
listening to the contact name, and one couldn't be created, or for some
reason the server didn't feel like accepting this request for a
connection, or the destination NCP was unable to complete the connection
(e.g. connection table full.) The destination index will be the source
index of the RFC. The source index will be zero because the NCP did not
put this connection into its tables. The data bytes, if there are any,
contain an ascii explanation.
CLS is also used to close a connection after it has been open for a while.
In the Arpanet, the NCP undertakes not to close the connection when the
user requests it, but waits until all data transfer has completed. This
is a source of extra complexity, since data transfer may be hung up, there
have to be timeouts, there have to be connections waiting to be closed
which aren't owned by any user, etc. It seems simpler to make CLS take
effect immediately, and let the user processes assure that data transfer
has been completed. Note that telnet-like applications don't need it, and
ftp-like applications have to have it separately from closing anyway.
4 FWD - forward a request for connection
This is a response to RFC which indicates that the desired service is not
available at the process contacted, but may be available at a
possibly-different contact name at a possibly-different host. The data
field contains the new contact name and the ack packet # field contains
the new host number. The issuer of the RFC should issue another RFC to
that address.
5 ANS - answer to a simple transaction
Simple transactions are transactions which consist of a single question
(request) and a single answer (response). They have no side effects, so
it is not necessary for either party to know whether the other party
thinks the transaction was completed or not. Simple transactions need no
flow control and no error control, other than a timeout by the user side
to detect lost messages. They are a simple, efficient way for doing
simple-minded things such as extracting information (such as the time or a
host name table) from a central server.
A simple transaction is initiated by sending an RFC to a server which
happens to use simple transactions rather than full connections. The data
field of the RFC consists of the contact name of the server, optionally
followed by a space and arguments to the request. The server responds by
sending an ANS packet, whose data field is the answer. The destination
address of the ANS comes from the source address of the RFC. The source
address, packet #, and ack packet # fields of the ANS are not used and
should be zero.
The difference between simple transactions (2 packets) and full datagrams
(4 packets) is that in the simple transaction the two parties don't have
to agree about whether or not the transaction in fact took place, while in
the full datagram they do, so acknowledgement is necessary.
200-377 DATA - Transmits Data
The data portion of the packet is data being sent through the connection.
The packet # is a number that increments by one for each data packet sent
in this direction on this connection. This is used to detect lost packets
(which includes packets garbled in transmission and packets lost in the
statistical flow control scheme) and duplicated packets (caused by lost or
delayed acknowledges). The NCP undertakes to deliver the packets to the
destination process in the same order that they came from the source
process, with no duplications and no omissions. Note that any opcode with
the sign bit on is a data packet as far as the NCP is concerned; if they
wish, higher-level protocols may use the opcode field to define various
different kinds of data packets. Thus, what is herein called a data
packet may be a "control" packet to a higher-level protocol.
6 NOP - no-operation
This type of packet does nothing special to itself. However, like all
packets it contains an "acknowledge packet number" field. The main use
for NOP packets, then, is as a vehicle to convey acknowledgements. The
packet# field of NOP is not used (i.e. NOPs are not themselves
acknowledged.)
7 WIN - set window size
The packet# field contains the window size desired for packets sent
through this connection to the sender of this packet, i.e. in the reverse
direction from the direction the WIN was sent. Since this is an
unacknowledged packet, the packet# field can be usurped from its normal
use in this way.
The window size is the maximum number of outstanding unacknowledged data
packets which the sender may have at any one time. If the sending user
process tries to transmit additional data packets on this connection, it
should be made to wait until some packets have been acknowledged.
The intention of the window size is to regulate how often "acknowledges"
must be returned. It's the same as in DSP and TCP. This is explained in
more detail below.
No sender is actually required to pay any attention to the window size.
No receiver is actually required to set the window size to something
reasonable. However, those hosts that want to maximize performance should
do something about the window size. The size is initially set during the
RFC/OPN dialogue, presumably according to the type of protocol being used.
An NCP may, if it chooses, use the WIN packet to dynamically adjust the
window size according to observed network behavior. (See dynamic
window-size adjustment section below.)
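The window rule above can be sketched as follows; this is illustrative only
(a real NCP would also keep the unacknowledged packets around for
retransmission).

```python
class Sender:
    # Sender-side window enforcement: at most `window` packets may be
    # outstanding (sent but not yet acknowledged) at any one time.
    def __init__(self, window, first_pktno=0):
        self.window = window
        self.last_sent = first_pktno    # packet # of last packet sent
        self.last_acked = first_pktno   # packet # of last packet acknowledged

    def outstanding(self):
        return (self.last_sent - self.last_acked) & 0xFFFF

    def may_send(self):
        return self.outstanding() < self.window

    def send(self):
        # The user process must wait when the window is full.
        assert self.may_send()
        self.last_sent = (self.last_sent + 1) & 0xFFFF
        return self.last_sent

    def ack(self, ackno):
        # ackno is the "ack packet #" field of an incoming packet.
        self.last_acked = ackno
```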
10 WTS - window too small
The sender sends a WTS packet if it finds its buffer full, a situation
which indicates that either the window size is too small or the sender is
faster than the receiver. See the flow control section. The rate of
transmission of WTS packets has to be strongly limited, since the
condition which causes them will not go away right away and may not go
away at all if the receiver is slower than the sender.
11 LOS - you are losing
If a host receives a packet for a connection that does not exist (other
than RFC, which isn't associated with a particular connection, and LOS and
CLS, which are safe to ignore), it should interchange source and destination,
change the opcode to LOS, insert an ascii explanation of the problem in
the data field, and send the packet back. A host receiving a LOS should
break the connection specified by the destination index and inform the
associated process that something has gone wrong. It should make the LOS
packet available to that process so the explanation of the problem can be
read.
The LOS packet isn't actually necessary, since if a packet is
re-transmitted many times without being acknowledged, the NCP should give
up, close the connection, and inform the user that the foreign host
appears to be dead.
For debugging, an echo feature is implemented as follows. If you send a
packet with a data opcode and source and destination indices of 0, it
should be sent back as a LOS packet. The data field will be destroyed by
the error explanation, but the packet # and ack packet # fields can be
used to remember any sequencing or checking information.
12 LSN - listen (never transmitted through the net, see below)
>>Table of packet field use vs opcode
The unused field is never used and must be zero; the forwarding count
field is always used in the same way; and the nbytes field is always the
length of the data. Fields marked "0" below are "don't care" rather than
"must be zero", but zero is certainly recommended.
                Destination      Source
Opcode          Host    Index    Host    Index   Packet#      Ack pk#      Data
------          ----    -----    ----    -----   -------      -------      ----
RFC             usual   0        usual   usual   first - 1    window size  contact name
OPN             usual   usual    usual   usual   first - 1    window size  0
CLS             usual   usual    usual   0       0            0            reason
ANS             usual   usual    usual   0       0            0            answer
FWD             usual   usual    usual   0       0            new host #   contact name
NOP             usual   usual    usual   usual   0            usual        0
WIN             usual   usual    usual   usual   window size  usual        0
WTS             usual   usual    usual   usual   0            usual        0
LOS             src h   src i    dst h   dst i   pk#          ack pk#      reason
LSN             0       0        0       0       0            0            contact name
Data            usual   usual    usual   usual   usual        usual        data
>>Flow and Error Control
The NCPs conspire to ensure that data packets are sent from user to user
with no duplications, omissions, or changes of order.
Each receiver (each end of each connection is a receiver, and also a
sender; think of receivers and senders as little actors inside the NCP)
has a list of buffers containing packets which have been successfully
received and are waiting to be read by the user process, and two packet#
variables. One is the number of the last packet read by the user process.
The other is the number of the last packet which has been acknowledged.
If these two are not equal, the receiver needs to send an acknowledgement
"soon."
The received-packet list needs to be kept sorted by packet number, and the
NCP has to make sure that the user process does not see duplicates or
omissions. If packets arrive out of order, the NCP has to sort them.
This means that the user process may not be able to read a packet even
though the receive list is non-empty, because the first packet in the
receive list is not the successor of the last packet read by the user
process.
A good way to do this is to have 2 lists of received packets, each of
which is sorted. The first list contains those packets which the user
process may read; the first in the list is the successor to the last
packet read, and there are no gaps in the list. The second list is the
remaining packets, which the user may not read until some other packet
arrives to fill in the gap. Each list needs a pointer to head and tail,
and the packet number of the next packet to be appended to the tail.
It is not actually a requirement that an NCP support out-of-order packets,
rather than simply discarding them and hoping that they will be
retransmitted in order, but if it's not very hard to do one might as well
do it.
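The two-list scheme can be sketched as follows. This is illustrative only;
here the second list is kept as a dictionary keyed by packet number rather
than a sorted list, which is equivalent for this purpose.

```python
class Receiver:
    # `ready` holds a gap-free run starting right after the last packet
    # read by the user process; `pending` holds later packets that may not
    # be read until some other packet arrives to fill in the gap.
    def __init__(self, last_read=0):
        self.next_ready = (last_read + 1) & 0xFFFF  # pktno that extends `ready`
        self.ready = []
        self.pending = {}                           # pktno -> data

    def receive(self, pktno, data):
        self.pending[pktno] = data
        # Move any now-contiguous run of packets over to the readable list.
        while self.next_ready in self.pending:
            self.ready.append(self.pending.pop(self.next_ready))
            self.next_ready = (self.next_ready + 1) & 0xFFFF
```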
Acknowledgements are sent by putting the last-received variable into the
"ack packet #" field of an outgoing packet on the opposite direction of
the appropriate connection, and copying the last-received variable into
the last-acknowledged variable. Where does the outgoing packet come from?
First of all, all outgoing data, NOP, WIN, and WTS packets automatically
carry acknowledgement for the reverse direction of their connection. So
if an outgoing packet happens to be sent at a time when an acknowledgement
is necessary, that takes care of it.
Secondly, if the number of outstanding unacknowledged packets is more than
1/2 the window size, a NOP should be generated and sent immediately to
acknowledge those packets before the sender fills up the window.
Thirdly, the "soon" of four paragraphs back is implemented by a timeout in
the NCP. If an acknowledgement remains required for a certain amount of
time, a NOP should be generated and sent to carry it. The appropriate
time interval is 1 second, I would guess. This timeout does not have to
be too exact, however.
The reason for having a timeout here, rather than just sending an
acknowledgement right away, is two-fold. It allows "batching" of
acknowledgements, where a single packet can be used to acknowledge many
packets, which halves the network traffic caused by bulk data transfer.
It also allows "piggy-backing" of acknowledgements on data packets, which
(for instance) decreases the network traffic caused by remote-echoing
telnet connections.
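The acknowledgement rules above (piggy-backing on outgoing packets, the half-window threshold, and the roughly 1-second timeout) can be sketched as follows; the class and field names are invented for illustration:

```python
ACK_TIMEOUT = 1.0    # "soon": the text guesses 1 second, not exact

class AckState:
    """Sketch of the receiver's acknowledgement bookkeeping: one
    variable for the last packet given to the user, one for the
    last packet acknowledged (illustrative, not any real NCP)."""

    def __init__(self, window_size):
        self.last_received = 0      # last packet # read by user process
        self.last_acked = 0         # last packet # acknowledged
        self.window_size = window_size
        self.ack_due_since = None   # when an ack first became pending

    def packet_read_by_user(self, pktnum, now):
        self.last_received = pktnum
        if self.ack_due_since is None:
            self.ack_due_since = now

    def stamp_outgoing(self, packet, now):
        # piggy-back: every outgoing packet carries the current ack
        packet['ack'] = self.last_received
        self.last_acked = self.last_received
        self.ack_due_since = None

    def needs_nop(self, now):
        # generate a NOP if unacknowledged packets exceed half the
        # window, or an ack has been pending longer than the timeout
        unacked = self.last_received - self.last_acked
        if unacked > self.window_size // 2:
            return True
        return (self.ack_due_since is not None
                and now - self.ack_due_since >= ACK_TIMEOUT)
```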
There is also something called a "null acknowledgement", which is an
acknowledgement which does not acknowledge any packets which weren't
acknowledged already, carried on a NOP. This sounds useless, but actually
it serves a purpose. A sender decides that the destination host is down
if it has been waiting for an acknowledgement for a long time (30
seconds?) and hasn't received one. If the receiving host is up, but the
receiver process is stopped or not asking for any more network input
(perhaps it is being interactively debugged) the sender should not decide
it is down. So a receiving NCP needs to recognize this case and send a
null acknowledgement. If the received-packet list for a connection
remains non-empty for a certain time (10 seconds?), the NCP should make
the connection require an acknowledgement by changing the
last-acknowledged variable to be one less than (modulo 2^16) the
last-received variable, i.e. should send a null acknowledgement.
When a receiver receives a data packet, it compares the packet # of that
packet with the last-received variable. If it is less or equal (modulo
2^16), it is a duplicate of a packet already given to the user, and should
be discarded (or it is at least 30000 packets early, which is unlikely.)
If it is greater, it is sorted into the received-packet list at the
appropriate position (if it has the same packet# as a packet already in
that list, it is a duplicate and is discarded.) It is NOT acknowledged at
this time; no packet is ever acknowledged until it has been given to the
user process ("end to end acknowledgement"). Since a packet on the
received packet list has not yet been acknowledged, it may be safely
discarded at any time if the operating system runs out of buffer space.
Also, if the receiving user process is not listening to the net, the NCP
cannot be swamped with arbitrary numbers of packets, since the sending
user process is not supposed to send more than window-size packets past
the last one the receiving user process read.
Note that the most likely cause of packet duplication is that an
acknowledge was lost, so whenever a duplicate packet is discarded, the
last-acknowledged variable should be set to one less than the
last-received variable so that a new acknowledge will be sent soon.
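The modulo-2^16 "less or equal" test can be sketched as follows; splitting the number space in half is an assumption on my part, consistent with the "at least 30000 packets early" remark above:

```python
MOD = 1 << 16    # packet numbers wrap modulo 2^16

def pkt_le(a, b):
    """True if packet number a is less than or equal to b, modulo
    2^16.  a counts as 'behind' b when the forward distance from a
    to b is less than half the number space; a packet that appears
    'behind' is either a genuine duplicate or absurdly far ahead
    (the text's "at least 30000 packets early" case)."""
    return (b - a) % MOD < MOD // 2

def is_duplicate(pktnum, last_received):
    # a data packet at or behind last_received is a duplicate
    return pkt_le(pktnum, last_received)
```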
The sender has a list of packets which have been entrusted to it by the
user for transmission and one packet # variable, the number of the last
packet sent by the user. When the user next sends a packet, the sender
will increment this variable and set the packet# of the sent packet to the
result. The sender also sets the source and destination host numbers and
indices of the packet, sets the "ack packet #" to the last-received
variable of its corresponding receiver, sets the receiver's last-
acknowledged variable to that, chooses a transmission medium by checking
the routing tables, and gives the packet to the transmission medium for
"immediate" transmission (perhaps it has to wait its turn in a queue.) It
also saves the packet on a list, in case retransmission is required.
With each buffered packet the sender holds in trust, it remembers the time
that packet was last transmitted. From time to time "retransmission"
occurs. The sender gives one or more packets from its list to the
transmission medium. It always starts with the oldest, so as to keep
things in order, and sends the rest in order until it gets to one that was
transmitted too recently to do again. Retransmission is used to recover
from lost or damaged packets, lost or damaged acknowledgements, and
packets discarded by the receiver due to lack of buffering capacity.
Each time a receiver receives a packet, it gives the "ack packet #" from
that packet to its corresponding sender. The sender discards any packets
with numbers less than or equal to that, since their successful receipt
has just been acknowledged.
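A minimal sketch of the sender's retransmission list and acknowledgement processing, with an assumed 6-second retransmission interval (the text suggests 2 to 3 times an estimated 2-to-3-second round trip); names are invented:

```python
RETRANSMIT_INTERVAL = 6.0   # assumed: 2-3x an estimated round trip

class Sender:
    """Sketch of the sender's list of entrusted packets.  Packets
    stay on the list, oldest first, until acknowledged."""

    def __init__(self):
        self.unacked = []    # [pktnum, last_tx_time], oldest first

    def send(self, pktnum, now, transmit):
        transmit(pktnum)
        self.unacked.append([pktnum, now])

    def retransmit(self, now, transmit):
        # start with the oldest, to keep things in order, and stop
        # at the first packet transmitted too recently to do again
        for entry in self.unacked:
            if now - entry[1] < RETRANSMIT_INTERVAL:
                break
            transmit(entry[0])
            entry[1] = now

    def process_ack(self, ack):
        # discard everything at or below the acknowledged number
        # (plain integer compare here; a real NCP compares mod 2^16)
        self.unacked = [e for e in self.unacked if e[0] > ack]
```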
A transmitter keeps re-transmitting packets until they get acknowledged.
Since the acknowledge is from the ultimate receiver to the original
sender, packets will eventually get sent correctly even if some link
somewhere in the network occasionally loses packets. This end to end
acknowledgement also allows flow control to be done by ignoring packets
for which the receiver does not have buffer space, which is considerably
simpler than "allocation" schemes, and allows a higher rate of throughput
[probably].
There are no negative acknowledgements in this scheme. If a packet is
garbled, how do you know who to send the negative acknowledgement to?
(But see the LOS packet.)
>>Dynamic Window-size Adjustment
Permit me to stress that this stuff is optional for small NCPs.
The goals of flow control are:
1. Error recovery.
2. If the receiver is faster than the sender, avoid unnecessary delays in
transmission due to having to wait for an acknowledge or having to wait
for the sender process to wake up.
3. If the sender is faster than the receiver, minimize retransmissions due
to receive buffer overflow.
4. Minimize the number of otherwise-useless packets generated to carry an
acknowledgement or a window-size negotiation, and minimize useless
retransmissions.
Consequences of the goals:
1. All packets will be retransmitted until acknowledged.
2. The sending NCP must buffer several packets, and packets must be
acknowledged in groups, not one-by-one.
3. If the receiver is slow, something must force the sender not to send
packets too fast.
4. The interval between retransmissions should not be too small. It may
be desirable for it to increase if the receiving process is not listening
for some reason.
The window size is the maximum number of packets which may be in the
network at one time (for one direction of one connection). "In the
network" means output by the sending process and not yet input by the
receiving process. (These processes are the entities which determine the
rate, unless they are so fast that the network slows them down.)
The window size is not the number of packets acknowledged at a time; for
best operation the latter must be 1/2 to 1/3 of the former. See below.
If the sending process is slow (and determines the rate), things are
relatively simple. We just have to have a big enough window size and
frequent enough acknowledgement to cover for sending process wakeup
delays.
If things are not limited by the sender, then
Window size
Flow rate = ---------------
Round trip time
Round trip time = time to wake up sender process (multiplied
by the fraction of the time this
is necessary)
+ time packet is sitting in sender buffers
before it is transmitted
+ transit time through the net
+ time packet is sitting in receiver buffers
before it is read; this is the maximum
of time to process previous packets
and time to wakeup sender process
+ time until acknowledge is generated and sent
+ transit time through the net
The round trip time is the time between when packet N is output by the
sending process and when it is acknowledged, permitting the sending
process to output packet N+WS.
The main variable components of the round trip time are the delay before
acknowledgement and the delay waiting in the receiver buffer for packets
to be processed. If these were zero, the round trip time would consist of
two process wakeups and two network transit times (determined by the delay
waiting for the cable and waiting for previous packets from this host to
be transmitted, the time needed to load and unload the interface in the
buffer, and the actual transmission time, multiplied by the number of
bridges in the path.)
This ideal round trip time is probably on the order of 2 seconds. The
timeout for retransmission should be 2 to 3 times the round trip time.
The timeout for acknowledgement should be 1/3 to 1/2 the round trip time.
One could either measure the actual round trip time, or use an estimate of
say 3 seconds, a little higher than the ideal. It would be a good idea to
measure the round trip time in any case, which is done by getting the
elapsed time since transmission when a packet is discarded due to its
being acknowledged, and averaging that.
The receiver process should initially set the window size to the maximum
flow rate it wants to handle times the desirable round trip time.
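The round-trip-time measurement and the initial window-size choice can be sketched as follows; the exponential smoothing factor is my own illustrative choice (the text only says "averaging"), and the timeout multipliers are picked from within the ranges given above:

```python
class RttEstimator:
    """Sketch of round-trip-time measurement: when a packet is
    discarded because it was acknowledged, fold the elapsed time
    since its transmission into a running average."""
    ALPHA = 0.125    # assumed smoothing factor, not from the text

    def __init__(self, initial_estimate=3.0):   # text's ~3 s estimate
        self.rtt = initial_estimate

    def sample(self, sent_at, acked_at):
        elapsed = acked_at - sent_at
        self.rtt += self.ALPHA * (elapsed - self.rtt)

    def retransmit_timeout(self):
        return 2.5 * self.rtt      # 2 to 3 times the round trip

    def ack_timeout(self):
        return 0.4 * self.rtt      # 1/3 to 1/2 the round trip

def required_window(target_rate_pps, round_trip_s):
    """Flow rate = window size / round-trip time, so the receiver
    should initially pick window = desired rate x round-trip time."""
    return max(1, round(target_rate_pps * round_trip_s))
```

For example, a receiver wanting 5 packets per second over a 2-second round trip would ask for a window of 10 packets.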
Symptoms of improper window size:
If the window-size is too large, the round trip time becomes long due to
packet processing delay in the receiver buffer. (There will be many
packets in the receiver buffer, and the receiver will be processing them
slowly.) The long round-trip time will cause unnecessary retransmissions.
Retransmissions could also be caused by the NCP's discarding received
packets due to insufficient core to buffer them.
If the window-size is too small, excessive process blocking and waking up
occurs. The receiver process often empties its buffer and has to block
until more packets arrive. The sender process often fills up its buffer
and has to block until some of the buffered packets are acknowledged. A
small window size also causes acknowledgements to have to be sent more
frequently than necessary. Note that from the receiver's point of view it
is impossible to distinguish between the window size being too small and
the sending process being too slow.
Here is a scheme for dynamic adjustment of the window size:
Note that window-size adjustments cannot take effect (in the sense of
fixing the symptoms) immediately, so it is necessary to limit the rate at
which the window size is adjusted.
When the receiver receives (and discards) a duplicate of a packet it
already has in its buffer, this indicates either that an acknowledgement
was lost or that the window size is too large. Since packets are assumed
not to be lost very often, we may as well assume the window size is too large
and send a WIN packet to decrease it. Another possibility would be to
have the sender detect the long round-trip time and complain to the
receiver, who could adjust the window size. The receiver must not
decrease the window size again until all packets currently buffered have
been read and acknowledged, indicating that the sender has had a chance to
decrease the number of packets buffered at its end. A reasonable amount
to decrease the window size by is 1/3 of its current value.
When the sending process wants to output a packet, but the number of
packets already buffered is greater than or equal to the window size, it
should send a WTS, indicating that the problem is too small a window size
or too slow a receiver rather than too slow a sender. When the receiving
process wants to input a packet, but the buffer is empty, and a flag is
set indicating that a WTS has been received, it should send a WIN packet
adjusting the window size upward by 1/2 of its current value (and clear
the WTS-received flag). This is rate-limited by preventing the sender
from sending a second WTS until all the packets buffered at the time the
first WTS was sent have been acknowledged, indicating that the receiver
has had time to act on the first WTS.
The variables required are: for each of the sending and receiving sides, a
packet number which has to be acknowledged before WTS or WIN can be sent
again, and a flag saying whether this number is operative; also, a
WTS-received flag in the receiver.
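A sketch of these adjustment rules, using the 1/3-decrease and 1/2-increase amounts from the text; the rate-limiting is simplified here to a single "hold until this packet number is acknowledged" variable per connection, and all names are invented:

```python
class WindowControl:
    """Sketch of dynamic window-size adjustment: shrink by 1/3 when
    a duplicate of an already-buffered packet arrives, grow by 1/2
    when the receiver runs dry after a WTS, each change held off
    until previously buffered packets have been acknowledged."""

    def __init__(self, window):
        self.window = window
        self.wts_received = False
        self.hold_until = None   # packet # that must be acked first

    def _held(self, last_acked):
        if self.hold_until is not None and last_acked < self.hold_until:
            return True
        self.hold_until = None
        return False

    def on_duplicate(self, last_acked, newest_buffered):
        # a duplicate of a buffered packet suggests the window is
        # too large; returns the new size to send in a WIN packet
        if self._held(last_acked):
            return None
        self.window -= self.window // 3
        self.hold_until = newest_buffered
        return self.window

    def on_receiver_idle(self, last_acked):
        # receiver wants input, buffer is empty, and a WTS arrived:
        # the window (or the receiver) is the bottleneck, so grow it
        if not self.wts_received or self._held(last_acked):
            return None
        self.wts_received = False
        self.window += self.window // 2
        return self.window
```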
It is important to meter the performance of this mechanism and find out
whether it does anything and whether what it does is right.
Consider the possibilities of changing this into a more symmetric and
negotiation-based scheme, where the sender always initiates window size
changing and the receiver either agrees or ignores the request. Consider
using elapsed time as an additional rate-limiter (have to use the other
thing, too, so idle connections don't keep changing window size; this may
be deleteable if it is always sender-initiated.)
More notes on the subject of window-size too small. This is identical to
receiver too slow. The net flow rate out of the sender is trying to be
higher than that into the receiver, so packets pile up in buffers at each
end. The round-trip time becomes arbitrarily high to preserve the equation,
dividing the window size down enough to match the receiver's flow rate.
The situation where the window-size is too small and we want to do
something about it has to be distinguished from two other situations.
One, the receiver is accepting packets slowly but the sender is also
sending them slowly. We don't want to change the window-size, because it
doesn't matter since packets aren't piling up, and at any time they might
both decide to go fast. Two, the receiver's net flow rate is high, but
its response time is long (it's taking packets in bursts). Here the
round-trip time is still long, but making the window size smaller would
make things worse.
The symptoms that distinguish the case where we want to make the
window-size smaller are: the round-trip time is long, the sender buffer
is full, and the number of packets acknowledged at a time is small
compared to the window size. Actually, the last two are sufficient: if the
acknowledgement batch size is small, and we know it's not the sender's
fault, we may as well decrease the window size to save buffer space and
decrease the round-trip time.
>>Uniquization
To avoid problems with packets left over from old connections causing
problems with new connections, we do two things. First of all, packets
are not accepted as input unless the source and destination hosts and
indices correspond to a known, existent connection. By itself, this
should be adequate, provided that retransmission is only done by the
originating host, not by intervening gateways and bridges in the network.
This is because we can safely assume that when a host agrees to open a
connection with a certain index number at its end, it will give up on any
previous connection with the same index, therefore it won't retransmit any
old packets with that index once it has sent out a new RFC or OPN. The
indications are that our network will be "local" enough that indeed
retransmission will only be done by the original host.
Problems could still occur if packets get out of order, so that an OPN
establishing a new connection gets ahead of a data packet for an old
connection with the same index. To protect against this, it is necessary
to assure that at least a few seconds elapse before an index number is
reused. This could be done either by remembering when an index is last
used, or by reserving part of the 16-bit index number as a uniquization
field, which is incremented each time an otherwise-the-same index is
reused. This field needs to be big enough to cover the maximum delay of
an old data packet with the same index; its size depends on the rate of
creation of connections. Which method is chosen is at the discretion of
each local NCP. Another necessary assumption is that when a system
crashes and is reloaded (thus forgetting any remembered information about
which indices were in use when, and so forth), the time to reload it is
more than a few seconds.
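A sketch of the uniquization-field alternative; the 4-bit/12-bit split of the 16-bit index is an assumed example for illustration, not something the protocol specifies:

```python
UNIQ_BITS = 4                 # assumed split of the 16-bit index
SLOT_MASK = (1 << (16 - UNIQ_BITS)) - 1

class IndexAllocator:
    """Sketch of index uniquization: the high bits of the 16-bit
    index are a counter bumped each time the same slot is reused,
    so a stale packet for the old connection fails the index match."""

    def __init__(self):
        self.uniq = {}        # slot -> current uniquizer value

    def allocate(self, slot):
        u = (self.uniq.get(slot, -1) + 1) % (1 << UNIQ_BITS)
        self.uniq[slot] = u
        return (u << (16 - UNIQ_BITS)) | (slot & SLOT_MASK)

def same_slot(index_a, index_b):
    # two full index numbers refer to the same underlying slot
    return (index_a & SLOT_MASK) == (index_b & SLOT_MASK)
```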
Problems could occur not only with left over data packets, but also with
left over control packets. This isn't too much of a problem since control
packets are not retransmitted, but it could still happen that a host gets
faked out into thinking that it has a connection to another host that the
other host doesn't know about. In this case, it should just look like the
connection was opened and then either the other host went down or the
connection was broken by a LOS packet, since the other host won't generate
any data packets and won't accept any.
>>Media handlers
A host may be connected to more than one transmission medium. It has
service programs for each.
When a packet is received that is not addressed to this host, the
forwarding count should be incremented. If it doesn't overflow, the
packet should be sent back out according to the routing tables, otherwise
it should be discarded. Normally it would not be useful to send a packet
back out on the same subnet it came in on, but we may as well let the
forwarding count catch this along with other loops.
When a packet is received, if the opcode is RFC, it is handled specially.
The contact name is compared against those of all the indices which are in
the Listening state. If a match is found, that index is put into the
RFC-received state, its LSN packet is discarded, and the RFC packet is put
into its input list so that the server process can see it. If no server
is listening to that contact name, the RFC packet is placed on the
pending-RFC list, and (in the case of ITS) a server process is created
which will load itself with a suitable program to open an index in
"server" mode, gobble an RFC packet, look at the contact name, and either
reload itself with the appropriate server program or send a CLS reply.
When a non-RFC packet is received, the system must look for a receiver
index to handle it. If none is found, or the state is wrong, or the
source host and index don't match, a LOS should be sent unless the
received packet was a LOS. Otherwise, if the received packet is WIN, WTS,
or NOP, it is processed and discarded. Other packets are given to the
user; OPN, CLS, and LOS cause a state change but are also given to the
user as input.
The transmitting side of a transmission medium handler has a queue of
packets to be transmitted. It should send them out, in order, as fast as
possible, except that if a receiving host has no buffer space (which can
be detected because its chaosnet interface will cause "interference" on
the ether), it should look down the list for another host to send to. As
long as packets to the same host are sent in the order they are queued,
everything will be all right. (Actually, this normally shouldn't matter
much.) In addition, when the packets are put into the transmit queue, the
destination host number has to be looked up in a table to determine which
transmission medium to use to get to it and (in the case of ether) which
physical host number to put in the packet trailer for the hardware.
>>Buffers
In ITS, the buffering scheme will be as follows. There will be a pool of
32-word packet buffers available. When it runs out, more can be made;
when there are many free, some can be flushed. 32-word buffers are made
out of 1024-word pages (adding a new page type), rather than using the
existing 128-word buffer mechanism, because there is a limited number of
128-word buffers, and that mechanism is a little painful to use. There
are likely to be several hundred packet buffers (say 12K of core) in use
when high-speed (e.g. mag-tape speed) file transfer is going on.
Each packet buffer has a two-word header, and 30 words which can hold a
packet. Packet buffers can be on one (or sometimes two) of six lists:
1. The free list.
2. The receive list, of which there are two for each index: one for
packets which the user may safely read and one for out-of-order packets
which are awaiting the arrival of an earlier packet before the user may
read them.
3. The send list, of which there is one for each index.
4. The transmission list.
5. The pending-RFC list.
6. The pending-LSN list.
The free list contains packet buffers which are free. They are threaded
together through addresses in the first word. Zero ends the list.
The transmission list contains packets which are to be transmitted out on
the network "immediately". At interrupt level packets are pulled off of
this list and sent. (There might be more than one transmission list if a
machine is connected to more than one physical medium.) The transmission
list is threaded through addresses in the left half of the first word.
Zero ends the list. After transmission -1 is stored to indicate that the
packet is not on the transmission list any more. If the right half of the
first word is -1, indicating that the packet is not also on a send list,
it is returned to free.
Each send list contains packets for a particular connection which have
been entrusted to the system by the user to be sent, but have not yet been
acknowledged. They are threaded together through the right half of the
first word. The second word contains the time that the packet was last
transmitted (actually, the time that it was last put on the transmission
list.)
Each receive list contains packets which have been received on a
particular connection and not yet read by the user. They are threaded
together by addresses in the first word, and the list ends with zero. The
receive lists must be kept sorted by packet number.
The pending-RFC list contains request-for-connection packets which have
not yet been either accepted or rejected. They are threaded together
through the first word. When a server process finishes getting created
and loaded, it will take an RFC off the pending-RFC list and put it on its
own receive list. The second word of these packets contains the time
received so that the system can know when something has gone wrong and
they should be thrown away.
The pending-LSN list contains LSN packets for all the listening users.
These packets are just used as a handy place to save the contact name
being listened to. They are threaded together through the first word.
The source-index field in the packet header can, of course, be used to
find which user this packet belongs to.
>>ITS System Calls
(Other systems would have similar calls, with appropriate changes for
their own ways of doing things.)
OPEN
Not allowed. (I said this wasn't a "standard" device!)
Instead use:
CHAOSO
arg 1 - receive channel number
arg 2 - transmit channel number
First, the two specified channels are closed. Then an index
is assigned to the user and the two channels are set up to
point to it. Two channels are used since in general ITS
channels are unidirectional, and to allow the user to
handle receive and transmit interrupts differently.
The created index is placed in the Closed state. To set up
a connection, IOT an RFC or LSN packet down the transmit
channel.
IOT
Always transfers exactly one packet. The effective
address of .IOT is the address of a 30.-word block
which contains the packet. .CALL IOT should be given
an immediate second argument which is the address of
the 30.-word block. .CALL SIOT is not allowed.
The format of the 30.-word block is:
16 16 4
-----------------------------------------
| opcode | unused | fc | nbytes | 0 |
-----------------------------------------
|destination host |destination index| 0 |
-----------------------------------------
| source host | source index | 0 |
-----------------------------------------
| packet # | ack packet # | 0 |
-----------------------------------------
| data1 | data2 ...
... data104 |
-----------------------------------------
In the descriptions below, if an error is said to
occur that means IOC error 10. (channel in illegal mode)
is signalled.
In the case of an output IOT, the user sets only
the opcode, nbytes, and data-n fields. When the
NCP copies the packet into a buffer in the system
it sets the other fields of the header to the
appropriate values.
This is not completely true. When outputting an RFC,
the user sets the destination host field, and sets the
ack packet # to the receive window size desired. The user
also sets the window size when outputting an OPN.
The NCP checks for the following special values
in the opcode field of a packet output by the user:
RFC - error if the index is not in the Closed state.
The packet is transmitted (but not queued for
possible retransmission) and the index enters
the RFC-sent state. The user should do an input
IOT which will wait for the OPN, CLS, FWD, or ANS reply
packet to arrive. The NCP also copies and saves
the user-specified host number and window size.
LSN - error if the index is not in the Closed state.
It is put into the Listen state. The packet
is not transmitted, but it is saved so that
when an RFC comes in the system can compare
the contact names. (Note- LSN is a special
opcode which is never actually transmitted
through the net.) The pending-RFC list is searched
to see if an RFC with the same contact name has
been received. If so, it is given to this index
as if it was received just after the LSN was
sent out.
OPN - error if the connection is not in the RFC-received
state. It is put into the Open state. The
packet is transmitted (but not queued for
retransmission, since until it is received
the other end does not know what index to
send acknowledgements to.) The system also
copies and remembers the window size.
CLS - error if the connection is not in the Open
or the RFC-received state. It is put into
the Closed state and the packet is transmitted
(but not queued for retransmission). This packet
may optionally contain data bytes which are
an ascii excuse for the close.
FWD - error if the connection is not in the RFC-received
state. The packet is transmitted, but not queued
for retransmission, and the connection is put into
the Closed state.
ANS - error if the connection is not in the RFC-received
state. The packet is transmitted, but not queued
for retransmission, and the connection is put into
the Closed state.
200 or higher - This is a data packet. Error if the
connection is not in the Open state. A packet#
is assigned, the destination and source fields
are filled in, and the packet is transmitted and
queued for retransmission.
Any other opcode causes an error.
In the case of an input IOT, the user will get an error
if the connection is in the Closed or Broken state,
except if it is in the Closed state and there are data
packets queued. This is so that the user can read the
CLS packet. Otherwise, it will hang until a packet
arrives, then return the packet into the user's
30.-word block.
The user should check the sign bit of the first word,
which will be set if this is a data packet. The
non-data packets which can get given to the user are
RFC, OPN, FWD, ANS, LOS, and CLS. (You shouldn't be
surprised if you get something else, though!)
CLOSE
Immediately closes the connection. All buffers and other
information associated with the index are discarded. Normally
the user should first IOT a CLS
packet containing an ascii explanation for why it is
closing. Note that any data previously written on the
connection but not yet received by the other end will be
lost. User programs should exchange "bye" commands of some
sort before closing if they care about losing data. It is
done this way to keep the NCP simple.
RESET
Does nothing.
FORCE
Does nothing.
FLUSH
On an output channel, does FORCE and then waits until
there are no queued output buffers. I.e., waits for
all output to be received and acknowledged by the foreign
host.
RCHST
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 -1
RFNAME
val 1 SIXBIT/CHAOS/
val 2 0
val 3 0
val 4 0
val 5 0 or 1 ?
WHYINT
val 1 - %WYCHA
val 2 - state
val 3 - number of packets queued (receive,,transmit)
val 4 - window size (receive,,transmit)
LH(val 3) is the number of packets available to input IOT.
This is different from the number of received packets
if some are out of order.
RH(val 3) is the number of packets which have been transmitted
by output IOT but which have not yet been received and
acknowledged by the foreign host.
The state codes are:
%CSCLS Closed
%CSLSN Listen
%CSRFC RFC-received
%CSRFS RFC-sent
%CSOPN Open
%CSLOS Broken by receipt of "LOS" packet.
%CSINC Broken by incomplete transmission (no acknowledge
for a long time)
NETBLK
Similar to Arpanet NETBLK.
STYNET
This should work the same as on the Arpanet. It will
not be implemented initially, however.
CHAOSQ
arg 1 - address of a 30.-word block (packet buffer)
This is a special system call for use by the ATSIGN CHAOS
program, which is a daemon program that gets run when
an RFC is received that does not match up against an
existing LSN.
The first packet on the pending-RFC queue is copied
into the packet buffer, then moved to the end of the
queue (so that the right thing happens when several
RFC's are pending at the same time.)
The call fails if the pending-RFC queue is empty.
The program should use the contact name in this
packet to choose a server program to execute. This
server program will then LSN to (presumably) the same
contact name, thus picking up the RFC.
Interrupts
IOC error interrupts occur if an attempt is made to IOT
when the connection is in an improper state, or to IOT
a packet with an illegal opcode.
An I/O channel interrupt is signalled on the input channel
when the number of queued buffers changes from zero to
non-zero.
An I/O channel interrupt is signalled on the output channel
when the number of queued buffers changes from greater or
equal to the window size, to less than the window size.
An I/O channel interrupt is signalled on the input channel
when the connection state changes.
Interrupts can be used for
(1) detecting when input arrives.
(2) detecting when the system is willing to accept
output.
(3) detecting when the other end does a CLOSE.
(4) detecting when a requested connection
is accepted or rejected.
(5) detecting when a request for connection
comes into a listening server.
---------- Material after this point may be inoperative ----------
>Comparison with LCSnet, and other blathering.
>>Principal differences
The LCSnet proposed protocol is called DSP. The Chaosnet protocol will
just be called chaos in this section.
(1) DSP specifies things in terms of bytes where Chaosnet specifies them
in terms of packets. We choose packets to increase the simplicity and
efficiency of the scheme. DSP has to work in terms of bytes because it
allows packets to be reformatted en route, hence
(2) DSP assumes that gateways can exist between networks with the same
protocols but different packet sizes. Therefore, the protocol has to
allow for the fact that packets may be reformatted en route. I happen to
believe that this situation is extremely unlikely to exist, and in fact
gateways between "different" networks will have to do much more than just
change the packet size. Therefore, it makes sense to make the gateway
worry about gateway issues, rather than have them permeate the whole
protocol. I believe that gateways will look more like regular network
ports than like transmission media; to get from a host on the local net to
a host on the arpa net, one will connect to the arpa net gateway and ask
it to open a connection from itself to the host on the arpa net, then tie
those connections together. A gateway will perform not only packet
reformatting, but protocol translation, flow control on both sides, and
maybe even character set translation. There can also be entities called
"bridges", which connect two networks (or two separate segments of one
network) with the same protocol. A bridge simply forwards any packets it
receives, but never alters the packets, and never looks inside them except
to find out where to forward them to.
(3) A related difference is that DSP includes the arpa net, and TCP, and
by extension all the networks in the universe, in its port number address
space. Chaosnet would have you connect to a gateway, then send the
gateway the port number to connect to in the foreign address space
separately. Chaosnet does have to include all networks reachable by
bridges in its address space.
(4) Chaosnet has an "opcode" field in the packet header, where DSP does
not. DSP achieves the same effect with various bits here and there. It
makes little difference unless user-level programs decide to exploit the
opcode field.
(5) DSP and Chaosnet have quite different mechanisms for creating
connections. In DSP, one never creates a connection, exactly; one simply
starts sending to a port address. Local network note #3 mumbles about how
one discovers which port address to send to, but I have not seen any
specifics. In Chaosnet, the mechanism for finding out where to send to
and the mechanism for creating a connection are intertwined; the excuse is
that often the process being connected to is created at the same time as
the connection.
(6) DSP uses unique, never-reused port IDs. Chaosnet does not. The
problem with unique, never-reused IDs is that I know of no system that can
implement them. Multics comes close, with the aid of a special hardware
clock. The clock is set from the operator's watch when the system is
powered on, and the mechanism depends on the fact that the error in the
operator's watch is less than the time required to bring up the system
after a power failure. Small systems that cannot afford special hardware
just for this, and don't have permanent disk storage, would find it very
hard to generate unique IDs.
Chaosnet prefers to rely on a scheme that doesn't require special
hardware, but nearly always works. By requiring a connection to be opened
before data packets can be sent through it, and by some assumptions about
the structure of the network, the problem is eliminated. See the Flow and
Error Control section for further discussion.
(7) DSP closes the two directions of a connection separately. Why?
>>High priority data packets, interrupts, and flushing.
The basic idea is to note that if you want to send a high priority
message, this means you want it out of order with respect to previously-
sent data on some connection. Therefore, high priority data should be
sent over an auxiliary connection. The per-connection overhead is not
prohibitively high, and this eliminates vast quantities of hair from the
innermost portion of the system.
One advantage that DSP gains by having "high priority messages" built into
the system is that it also incorporates a standardized way to "mark" a
particular point in a data stream. However, this is comparatively
unimportant, particularly since I think high-priority messages will
probably never get used. The only place I've heard them proposed to be
used is with Telnet, but ordinary terminals get along quite well without
"out of band" signals when used with reasonable operating system software.
Interrupts and flushing of input are random crocks associated with high
priority messages. I don't propose to implement them either.
>>Datagrams. (Connections used only to pass a single packet.)
These are easy. The guy who wishes to send a datagram does OPEN, IOTs an
RFC to the service to which the gram is to be sent, and NETBLKs waiting
for the connection to open up. He then IOTs the data packet, FLUSHes
waiting for it to get there, then CLOSEs.
The server OPENs and IOTs an OPN in response to the RFC. She then IOTs in
the datagram packet, CLOSEs, and goes off processing the message.
Four packets are transmitted, two in each direction. (An RFC, an OPN, a
DATA, and an ACKing NOP.) No need to send any CLS messages, since each
user process knows to do a CLOSE system call after one data packet has
been transmitted. It has been claimed that this is the theoretical
minimum if acknowledgement is required. The reason is that the data
packet must contain some unique id generated by the RECEIVER to avoid
duplicates, and it must be acknowledged, so that's two packets in each
direction, with no combining possible.
Note that as [someone at] PARC has pointed out, for the important case of
side-effect-free transactions, a timeout can substitute for
acknowledgement, and only two packets are necessary. See ANS.
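To make the four-packet exchange concrete, here is a minimal sketch in
Python. The opcode sequence comes from the text above; the function and
the tuple representation of packets on the wire are invented purely for
illustration.

```python
def datagram_exchange():
    # Hedged sketch of the datagram exchange described above; each
    # entry is (sender, receiver, opcode). Both sides then just CLOSE
    # without any CLS packets being transmitted.
    wire = []
    wire.append(("user", "server", "RFC"))   # user asks for the connection
    wire.append(("server", "user", "OPN"))   # server accepts the RFC
    wire.append(("user", "server", "DATA"))  # the single datagram packet
    wire.append(("server", "user", "NOP"))   # the ACKing NOP for the DATA
    return wire
```

Four packets total, two in each direction, with no combining possible,
as claimed in the text.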
>>Why not multiple messages per packet?
[1] Not needed for data. The division of the data stream into packets is
invisible to the real user, anyway. It's only used by the "user-ring"
portion of the network system software.
[2] Extra complexity. Consider the hair involved with packed control
messages in the Arpanet. Because of the control link being shared between
multiple connections between the same pair of hosts, this could save a
little. I don't know of any NCP that does this; furthermore, having that
shared facility is a bad idea. The only case in the Arpanet where packing
two control messages into one packet is useful is when opening a
connection, where the receiver wants to send both STR and ALL. In this protocol
we just put the window size in as part of the RFC and OPN messages.
[3] There is an argument that having message boundaries separate from
packet boundaries is useful because gateways between different networks
may need to split up packets because the two networks may have different
maximum packet sizes. My feeling about this is that the gateway is likely
to have to do a good deal more than that. It seems like too much to wish
for that the two networks should use exactly the same packet format,
protocols, or even character set; so the gateway, rather than being a
packet-reformatting device, is much more likely to look like a user program
with two connections, one on one network and one on the other, which
passes data between the connections with appropriate conversion. In
particular, flow control is likely to be host1 to gateway and host2 to
gateway, rather than host1 to host2.
>>Why not have a checksum in the packet?
This network is likely to have a very diverse collection of machines on
it, which means it will be impossible to define a checksum which can be
computed efficiently in software on all machines. Now all hardware links
in the system ought to have whatever amount of hardware checking is
appropriate to them, but due to the efficiency cost of a user-level
end-to-end checksum, this should not be a built-in requirement of the basic
low-level protocol. Instead, checksumming should be an optional feature
which some higher-level protocols (those that need it because the data
being passed through them is so vitally important that every possible
effort must be made to ensure its correctness) may implement.
Checksumming should be implemented at the user level in exactly the same
way and for exactly the same reasons as encryption should be implemented
at the user level.
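As one illustration of what a user-level option might look like, a
higher-level protocol that wants checksumming could run something as
simple as the following over its data. The particular 16-bit additive
sum is my assumption for the sketch; nothing in the base protocol
specifies or carries it.

```python
def additive_checksum_16(data: bytes) -> int:
    # 16-bit additive checksum with wraparound; one arbitrary choice a
    # higher-level protocol could make at the user level. The base
    # low-level protocol neither computes nor transmits this.
    s = 0
    for b in data:
        s = (s + b) & 0xFFFF
    return s
```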
>>How about host-independent user-level protocols, where one just connects
to a service and doesn't have to know what host it's at today?
Yeah, how about it? As far as I know, this protocol provides an adequate
base for constructing such a thing. Also I haven't seen anything
published on the subject.
>>Very small hosts.
E.g. we'd like to put the Chess machine on the net. It has very little
memory, but not totally impotent microcode. A small host need only
support one connection, may ignore WIN, LOS, and CLS, may only have one
packet in transmission at a time, and may process received packets one at a
time (ignoring any that come in until the first one has been fully
processed). It IS necessary to check that received DATA packets come in
in the right order.
RFC may be handled by remembering the other guy's host number and index,
and sending back a completely canned OPN. The contact name is ignored.
If a second user tries to connect while a first user is connected, the
first user gets bumped. Let them fight it out on some larger machine (or
the floor) for who will get to use the small machine. Never originate any
packet type other than DATA and that one OPN.
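The minimal-host rules above can be sketched as follows. The class and
its packet arguments are invented for illustration (a real host reads
the header fields described earlier); only the behavior comes from the
text: one connection, a canned OPN, strict in-order DATA, a second RFC
bumping the first user, and everything else ignored.

```python
class TinyHost:
    # Hedged sketch of a very small host, e.g. the Chess machine.
    def __init__(self):
        self.peer = None     # (host, index) of the one supported connection
        self.expected = 1    # next DATA packet number that will be accepted
        self.to_send = []    # only DATA and the one canned OPN ever go out

    def receive(self, opcode, src, pktno=0, data=b""):
        if opcode == "RFC":
            self.peer = src                    # second RFC bumps the first user
            self.expected = 1                  # contact name is ignored
            self.to_send.append(("OPN", src))  # completely canned OPN
            return None
        if opcode == "DATA" and src == self.peer and pktno == self.expected:
            self.expected += 1                 # in the right order: accept it
            return data                        # hand to the one-at-a-time consumer
        return None                            # WIN, LOS, CLS, out-of-order: ignored
```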
Attaching ordinary terminals "directly" to the network is obviously
undesirable.
>Transmission Media
This section describes how packets are encapsulated for transmission
through various media, and what auxiliary hair is needed by each medium.
>>Ethernet
The messages transmitted through the ether (or CAIOS) net consist of a
packet followed by a three-word trailer:
+----------------+
| header | 8 words
+----------------+
| data | 0-52 words
+----------------+
| immediate dest | 1 word
+----------------+
| immediate src | 1 word
+----------------+
| CRC check word | 1 word
+----------------+
The three trailer words are looked at by the hardware; the last two of
them are supplied by the hardware. The reason this stuff is in a trailer
rather than a leader is that the caiosnet hardware actually transmits the
packet backwards. However, this is transparent to the software.
Bytes are sent two per word. The low-order byte is first (pdp11
standard).
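The two-bytes-per-word, low-order-byte-first packing can be sketched
like this (the function names are mine, not part of any interface):

```python
def pack_words(data: bytes):
    # Pack 8-bit bytes two per 16-bit word, low-order byte first
    # (pdp11 standard). An odd trailing byte is padded with zero.
    if len(data) % 2:
        data = data + b"\0"
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]

def unpack_words(words):
    # Inverse operation: 16-bit words back to bytes, low byte first.
    out = bytearray()
    for w in words:
        out.append(w & 0xFF)
        out.append((w >> 8) & 0xFF)
    return bytes(out)
```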
>>TEN11 Interface
[The following total hair has not been checked recently.]
Since interrupts can't be sent through the TEN11 interface, the pdp10 can
only service the network at relatively-infrequent intervals, for instance
every 1/60'th of a second. Therefore it is necessary to have queues of
packet buffers in each direction. This provides high speed by allowing
several packets to be processed at a time.
The speed and reliability of the TEN11 interface eliminates any need for
error checking. (ha ha) [ho ho] <he he> To decrease the load on the AI
pdp10, it is assumed that the pdp11's will be responsible for swapping the
bytes in the data portions of packets so that they will be in pdp10
standard order.
Even though the contents of packets will not be error-checked, the pdp10
must check addresses to avoid being screwed by a losing pdp11.
The form of a packet encapsulated for the TEN11 interface will then be
|-----------------|-----------------|----|
| queue thread | 0=empty, 1=full | 0 |
|-----------------|-----------------|----|
| #bytes | opcode | unused | 0 |
|-----------------|-----------------|----|
|destination host |destination index| 0 |
|-----------------|-----------------|----|
| source host | source index | 0 |
|-----------------|-----------------|----|
| packet # | ack packet # | 0 |
|-----------------|-----------------|----|
| data 0 | data 1 | data 2 . . . | 0 |
| | 0 |
|-----------------|-----------------|----|
for a total of 31 36-bit words, or 62 pdp11 words.
The queue thread is the pdp11 address of the next packet-buffer in a
queue, or zero if this is the last. The empty/full indicator says whether
this buffer currently contains a packet, or not.
The following is an attempt to express the algorithms of the pdp10 and
pdp11 in concise form. Hopefully they are self-explanatory.
Several queues of buffers need to exist in the pdp11. Only two of these
are known to the pdp10.
TO10QF - first buffer queued for transmission to the 10.
TO10QL - last buffer queued for transmission to the 10. Exists so that
buffers can be appended to the list more quickly.
TO10AC - first buffer in list of buffers actively being gobbled by the 10.
Set by 11, cleared by 10.
TO10FR - copy of TO10AC. Used to free the buffers after the 10 gobbles
them.
(come-from pdp11-packet-receivers
when (eq (destination-host ?packet) pdp10)
;re-arrange 8-bit bytes for 10
(swap-bytes (data-part packet))
;Add to to-10 queue
(set (thread packet) nil) ;nil=0
(cond ((null TO10QF)
(setq TO10QF packet TO10QL packet))
(t (set (thread TO10QL) packet)
(setq TO10QL packet)))
;Try to activate to-10 queue
(cond ((null TO10FR)
(setq TO10FR TO10QF
TO10AC TO10QF
TO10QF nil
TO10QL nil)))
)
(come-from pdp11-polling-loop
when (and (not (null TO10FR)) ;buffers were sent to 10
(null TO10AC)) ;and 10 has finished gobbling
(mapcar 'make-buffer-free TO10FR) ;mapcar follows queue words
(setq TO10FR nil) ;OK to activate more buffers now
(cond ((not (null TO10QF)) ; more stuff waiting, activate it now
(setq TO10FR TO10QF
TO10AC TO10QF
TO10QF nil
TO10QL nil)))
)
(come-from pdp10-clock-level
when (and (not (null TO10AC)) ;11 is sending buffers
(not web-buffers-locked))
;copy to user, process control message, or reject if buffers full
(mapcar 'process-input
TO10AC)
;signal pdp11 that all packets have been gobbled
(setq TO10AC nil))
FRBFLS - list of free buffers. cons-buffer gets from here,
make-buffer-free puts here.
FM10AC - list of buffers into which pdp10 may place packets
set by 11 / cleared by 10.
FM10GB - copy of FM10AC, used by 11 to process buffers after
10 has placed packets into them.
(come-from pdp11-polling-loop
when (and (null FM10GB) ;10 needs some buffers &
(not (null FRBFLS))) ; free buffers available
;give the 10 a list of a suitable number of empty buffers
(repeat max-at-a-whack times
(and (null FRBFLS) (exitloop))
(setq buffer (cons-buffer)) ;pull off free list
(set (full-indicator buffer) nil) ;contains no packet
(set (thread buffer) FM10GB) ;cons onto list
(setq FM10GB buffer))
(setq FM10AC FM10GB) ;give buffer list to 10
)
(come-from pdp11-polling-loop
when (and (not (null FM10GB)) ;gave 10 some buffers
(null FM10AC)) ;which it has used
;process packets sent from 10.
(mapcar
'(lambda (buffer)
(cond ((null (full-indicator buffer))
(make-buffer-free buffer)) ;didn't get used
(t (swap-bytes buffer)
(send-to-destination buffer))))
FM10GB)
(setq FM10GB nil)) ;no buffers active in 10 now
(come-from pdp10-clock-level
when (and (not (null FM10AC)) ;buffer space avail
(not web-buffers-locked)) ;no M.P. interference
;send as many packets as possible
(mapcar
'(lambda (buffer)
(cond ((needs-to-be-sent ?packet) ;find a sendable packet somewhere
(copy-into buffer packet)
(set (full-indicator buffer) t))))
FM10AC)
;signal pdp11 to gobble the packets
(setq FM10AC nil))
To avoid the need for a gross number of exec pages in the pdp10, the
FM10AC and TO10AC words, and all the packet buffers, should lie in a
single 4K page. The best location for this page varies from machine to
machine. On dedicated 11's such as the AI TV11, the MC console 11, etc.
it should probably just be the first 4K of memory. On the logo machine,
it would probably be best to put this page up in high memory where RUG
won't mess with it. In the case of the mini-robot system, I'm not sure.
It would be best not to try to use this protocol with "general purpose"
machines, because of problems with finding the list headers and packet
buffers, problems with telling whether the machine is running the right
program, etc. It should be used just as a way for the AI machine to get a
path to the net.
>>DL10 & DTE20
[Outline only]
[Just use the pdp11 as a substitute for a direct chaosnet interface.]
[Basically, the 11 says ready to transfer (in either direction), the 10
sets up the pointers and says to transfer, and the 11 transfers the cruft.
To eliminate an extra level of buffering, on input transfers the 11 makes
a copy of the first 4 16-bit words of the header available to the 10 when
it first says "ready to transfer." The 10 uses these to decide where to
copy the packet into. It helps if you don't try to use a DL10 on a
machine with a cache.]
>>Asynchronous line
Packets are encapsulated by preceding them with a start of text (202 octal),
and following them with a 1-byte additive checksum and an end of text (203
octal).
The 16-bit words are transmitted low order byte first. If the checksum is
wrong the receiver ignores the packet. The start and end characters are
just there to help in ignoring random noise on the line. If they don't
appear, the packet is ignored. The full 60-word packet is not
transmitted; the #bytes count is used to determine how many of the data
bytes to transmit; the receiver fills the remainder with zero (or garbage,
at its convenience).
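The encapsulation just described can be sketched as follows. The
framing characters and the 1-byte additive checksum are from the text;
I assume the checksum covers just the packet bytes, and the sketch works
on an already-serialized byte string rather than 16-bit words.

```python
STX, ETX = 0o202, 0o203   # start/end of text framing characters

def frame(packet: bytes) -> bytes:
    # Precede the packet with STX, follow it with a 1-byte additive
    # checksum and ETX. Only the #bytes-counted portion of the packet
    # should be passed in; the rest is not transmitted.
    ck = sum(packet) & 0xFF
    return bytes([STX]) + packet + bytes([ck, ETX])

def unframe(raw: bytes):
    # Return the packet, or None if framing or checksum is wrong;
    # the receiver just ignores bad packets.
    if len(raw) < 3 or raw[0] != STX or raw[-1] != ETX:
        return None
    packet, ck = raw[1:-2], raw[-2]
    return packet if sum(packet) & 0xFF == ck else None
```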
This protocol is intended mainly for communication between the plasma
physics pdp11 in bldg. 38 and a pdp11 in 545, until the caiosnet gets
extended that far (or a longer-distance, lower-speed caiosnet is extended
to various machines off in that direction.)
>Higher-Level Protocols
>>Telnet
Essentially this should follow "New Supdup". [Unresolved issues with
respect to output reset remain.]
>>File-Access
[*****]
>>Mail
[*****]
>>Locate-named-service
[*****]